
DPL: improve resource management #2513

Merged: ktf merged 1 commit into AliceO2Group:dev from ktf:improved-resource-allocation on Oct 21, 2019


@ktf (Member) commented Oct 18, 2019

Rework the way DPL assigns resources to devices. This is a first
step in moving away from the hardcoded 127.0.0.1 IP address, and it
allows DPL workflows to run distributed across several hosts without
a third party configuring the channels.
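
To show what dropping the hardcoded 127.0.0.1 means for channel configuration, here is a minimal C++ sketch. It assumes ZeroMQ-style endpoints: `tcp://*:port` binds on every interface, and `tcp://host:port` connects to a specific peer. The `AssignedResource` struct and both helper functions are hypothetical and are not the actual DPL types.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the host and port a device's channel was assigned to.
struct AssignedResource {
  std::string hostname;   // machine the device runs on
  unsigned short port = 0; // port reserved for this channel on that machine
};

// The binding side listens on all interfaces of its own host,
// so it does not need to know its externally visible address.
std::string bindAddress(const AssignedResource& r)
{
  return "tcp://*:" + std::to_string(r.port);
}

// The connecting side uses the peer's real hostname instead of 127.0.0.1,
// which is what allows the two devices to live on different machines.
std::string connectAddress(const AssignedResource& peer)
{
  assert(!peer.hostname.empty());
  return "tcp://" + peer.hostname + ":" + std::to_string(peer.port);
}
```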

A further optimization will also allow DPL to configure ipc / shared memory when two devices happen to be assigned to the same machine.
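
The same-host optimization can be sketched as a transport choice made per channel, assuming each end of a channel knows the hostname and port it was assigned. The `Placement` struct, `channelAddress()` and the `/tmp/<channel>_<port>` path scheme are all hypothetical. According to the discussion further down, the ipc support actually landed in a follow-up PR (#2517).

```cpp
#include <cassert>
#include <string>

// Hypothetical: where one end of a channel was placed.
struct Placement {
  std::string hostname;
  unsigned short port = 0;
};

// If producer and consumer share a host, use a Unix domain socket ("ipc").
// Otherwise fall back to TCP on the producer's host.
// The /tmp path naming is an assumption made for this sketch only.
std::string channelAddress(const std::string& channelName,
                           const Placement& producer,
                           const Placement& consumer)
{
  assert(!channelName.empty());
  if (producer.hostname == consumer.hostname) {
    return "ipc:///tmp/" + channelName + "_" + std::to_string(producer.port);
  }
  return "tcp://" + producer.hostname + ":" + std::to_string(producer.port);
}
```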

  • Allow setting the available resources from the command line.
  • Only spawn the devices whose assigned resource matches the local hostname.
  • If two devices are known to be on the same host, use ipc rather than tcp.
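
The first two points can be sketched in C++ as parsing a resource list and then keeping only the devices assigned to the local machine. The `host:cpu:memoryMB:startPort:rangeSize` syntax (comma-separated entries), the `Resource` and `DeviceAssignment` structs, and both functions are assumptions for illustration. The exact command-line syntax DPL accepts may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical resource entry: host, CPU, memory and a port range.
struct Resource {
  std::string hostname;
  int cpu = 0;
  int memoryMB = 0;
  unsigned short startPort = 0;
  unsigned short rangeSize = 0;
};

// Parses the assumed "host:cpu:memoryMB:startPort:rangeSize,..." syntax.
// std::stoi throws std::invalid_argument if a numeric field is missing.
std::vector<Resource> parseResources(const std::string& spec)
{
  std::vector<Resource> resources;
  std::istringstream entries(spec);
  std::string entry;
  while (std::getline(entries, entry, ',')) {
    std::istringstream fields(entry);
    std::string host, cpu, memory, port, range;
    std::getline(fields, host, ':');
    std::getline(fields, cpu, ':');
    std::getline(fields, memory, ':');
    std::getline(fields, port, ':');
    std::getline(fields, range, ':');
    assert(!host.empty() && "every resource needs a hostname");
    Resource r;
    r.hostname = host;
    r.cpu = std::stoi(cpu);
    r.memoryMB = std::stoi(memory);
    r.startPort = static_cast<unsigned short>(std::stoi(port));
    r.rangeSize = static_cast<unsigned short>(std::stoi(range));
    resources.push_back(r);
  }
  return resources;
}

// Hypothetical outcome of the assignment step: which host each device got.
struct DeviceAssignment {
  std::string deviceName;
  std::string hostname;
};

// Keep only the devices assigned to this machine; the remaining ones
// would be started on their own hosts.
std::vector<std::string> devicesToSpawn(const std::vector<DeviceAssignment>& assignments,
                                        const std::string& localHostname)
{
  std::vector<std::string> local;
  for (const auto& a : assignments) {
    if (a.hostname == localHostname) {
      local.push_back(a.deviceName);
    }
  }
  return local;
}
```

In a real driver, `localHostname` could come from POSIX `gethostname()`. Here it is passed in as a parameter so the filtering logic stays testable.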

ktf requested a review from a team as a code owner October 18, 2019 22:38
ktf changed the title from "DPL: improve resource management" to "[WIP]: DPL: improve resource management" Oct 18, 2019
@ktf (Member, Author) commented Oct 18, 2019

@aalkin

ktf force-pushed the improved-resource-allocation branch 2 times, most recently from 7c0726b to 5d0a219, October 19, 2019 00:31
ktf changed the title from "[WIP]: DPL: improve resource management" to "DPL: improve resource management" Oct 19, 2019
@ktf (Member, Author) commented Oct 19, 2019

@aalkin this now seems to compile and work.
@knopers8 @matthiasrichter @sawenzel @shahor02 could you verify this does not break anything in your workflows?

This is meant to pave the way for shared memory usage (via the DPL driver) and for static multiple-host support.

ktf force-pushed the improved-resource-allocation branch from 5d0a219 to ed6fde6, October 19, 2019 08:04
@shahor02 (Collaborator)

@ktf I've tested this with the heaviest reco workflows; they are not affected.

@aalkin (Member) commented Oct 20, 2019

@ktf it seems that test_DeviceSpec and test_Graphviz both fail because a random integer appears in place of a port number. This is most likely because we do not set the port and range in DefaultOffer, and the ComputingOffer struct itself has no default initialization.
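
A minimal C++ sketch of the kind of fix this points to: give the offer struct default member initializers, and set the port range explicitly in the default offer. The field names, `makeDefaultOffer()`, `portAt()` and the example values 22000/1000 are assumptions made for illustration. They are not the actual DPL definitions.

```cpp
#include <cassert>
#include <string>

// Illustrative offer type; field names are assumptions, not DPL's definition.
// With default member initializers, a default-constructed offer can no longer
// carry an indeterminate port.
struct ComputingOffer {
  std::string hostname = "localhost";
  float cpu = 0.f;
  float memory = 0.f;
  unsigned short startPort = 0;
  unsigned short rangeSize = 0;
};

// Hypothetical default offer that sets the port range explicitly.
// 22000 and 1000 are example values chosen for this sketch.
ComputingOffer makeDefaultOffer()
{
  ComputingOffer offer;
  offer.startPort = 22000;
  offer.rangeSize = 1000;
  return offer;
}

// Ports handed out from an offer stay inside [startPort, startPort + rangeSize).
unsigned short portAt(const ComputingOffer& offer, unsigned short index)
{
  assert(index < offer.rangeSize);
  return static_cast<unsigned short>(offer.startPort + index);
}
```

Without the initializers, a local `ComputingOffer offer;` would leave `startPort` and `rangeSize` indeterminate. That matches the symptom of a random integer showing up where a port number is expected.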

ktf force-pushed the improved-resource-allocation branch from ed6fde6 to 40a79c1, October 21, 2019 07:33
@knopers8 (Collaborator)

QC seems to be doing fine.

@ktf (Member, Author) commented Oct 21, 2019

OK. I am merging this and will open a new PR that optimises local connections to use "ipc" (i.e. Unix domain sockets). We can then try "inproc" (i.e. managed shared memory), but I cannot get it to work on my Mac yet.

ktf merged commit eb4eea9 into AliceO2Group:dev Oct 21, 2019
ktf deleted the improved-resource-allocation branch October 21, 2019 09:53
@ktf (Member, Author) commented Oct 21, 2019

Sorry, I meant ipc+shmem. This is now done in #2517.

knopers8 pushed a commit to knopers8/AliceO2 that referenced this pull request Oct 23, 2019
carlos-soncco pushed a commit to carlos-soncco/AliceO2 that referenced this pull request Oct 28, 2019
4 participants